1.
bioRxiv ; 2024 May 02.
Article in English | MEDLINE | ID: mdl-38168313

ABSTRACT

Actinobacteria, the bacterial phylum most renowned for natural product discovery, has been established as a valuable source for drug discovery and biotechnology but is underrepresented within accessible genome and strain collections. Herein, we introduce the Natural Products Discovery Center (NPDC), featuring 122,449 strains assembled over eight decades, the genomes of the first 8490 NPDC strains (7142 Actinobacteria), and the online NPDC Portal making both strains and genomes publicly available. A comparative survey of RefSeq and NPDC Actinobacteria highlights the taxonomic and biosynthetic diversity within the NPDC collection, including three new genera, hundreds of new species, and ~7000 new gene cluster families. Selected examples demonstrate how the NPDC Portal's strain metadata, genomes, and biosynthetic gene clusters can be leveraged using genome mining approaches. Our findings underscore the ongoing significance of Actinobacteria in natural product discovery, and the NPDC serves as an unparalleled resource for both Actinobacteria strains and genomes.

2.
Nat Rev Drug Discov ; 22(11): 895-916, 2023 11.
Article in English | MEDLINE | ID: mdl-37697042

ABSTRACT

Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.


Subject(s)
Artificial Intelligence; Biological Products; Humans; Algorithms; Machine Learning; Drug Discovery; Drug Design; Biological Products/pharmacology
3.
PLoS Comput Biol ; 19(2): e1010462, 2023 02.
Article in English | MEDLINE | ID: mdl-36758069

ABSTRACT

Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. 
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.


Subject(s)
Biological Products; Tandem Mass Spectrometry; Metabolomics; Bacteria/genetics; Multigene Family
5.
Nat Microbiol ; 7(5): 726-735, 2022 05.
Article in English | MEDLINE | ID: mdl-35505244

ABSTRACT

Bacterial specialized metabolites are a proven source of antibiotics and cancer therapies, but whether we have sampled all the secondary metabolite chemical diversity of cultivated bacteria is not known. We analysed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes (MAGs) using a modified BiG-SLiCE and the new clust-o-matic algorithm. We estimate that only 3% of the natural products potentially encoded in bacterial genomes have been experimentally characterized. We show that the variation in secondary metabolite biosynthetic diversity drops significantly at the genus level, identifying it as an appropriate taxonomic rank for comparison. Equal comparison of genera based on relative evolutionary distance revealed that Streptomyces bacteria encode the largest biosynthetic diversity by far, with Amycolatopsis, Kutzneria and Micromonospora also encoding substantial diversity. Finally, we find that several less-well-studied taxa, such as Weeksellaceae (Bacteroidota), Myxococcaceae (Myxococcota), Pleurocapsa and Nostocaceae (Cyanobacteria), have potential to produce highly diverse sets of secondary metabolites that warrant further investigation.


Subject(s)
Cyanobacteria; Streptomyces; Genome, Bacterial/genetics; Phylogeny; Secondary Metabolism/genetics
6.
Gigascience ; 10(1)2021 01 13.
Article in English | MEDLINE | ID: mdl-33438731

ABSTRACT

BACKGROUND: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). RESULTS: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a "query mode" that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. CONCLUSIONS: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.


Subject(s)
Biosynthetic Pathways; Multigene Family; Biosynthetic Pathways/genetics; Genomics; Humans; Metagenome; Secondary Metabolism
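The BiG-SLiCE abstract above contrasts pairwise, network-based grouping with grouping in Euclidean space in a "non-pairwise, near-linear fashion". The following is only a minimal sketch of that general idea, not the tool's actual implementation: each BGC is assumed to be already encoded as a numeric feature vector, and each vector is compared with the current family centroids rather than with every other BGC. The function name `assign_to_families`, the `threshold` parameter, and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_families(vectors, threshold):
    """Single-pass grouping: each vector is compared with family centroids only.

    Illustrative stand-in for centroid-based clustering; the real tool's
    feature extraction and clustering procedure are not reproduced here.
    """
    centroids = []  # running mean vector of each family
    sizes = []      # number of members in each family
    labels = []     # family index assigned to each input vector
    for vec in vectors:
        best, best_dist = None, None
        for idx, centroid in enumerate(centroids):
            dist = euclidean(vec, centroid)
            if best_dist is None or dist < best_dist:
                best, best_dist = idx, dist
        if best is not None and best_dist <= threshold:
            # Join the nearest family and update its running mean.
            n = sizes[best]
            centroids[best] = [
                (c * n + v) / (n + 1) for c, v in zip(centroids[best], vec)
            ]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # No family is close enough: start a new one.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


# Hypothetical toy data: counts of three protein domains in four BGCs.
toy_bgcs = [
    [2.0, 0.0, 1.0],
    [2.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 3.0, 1.0],
]
labels, centroids = assign_to_families(toy_bgcs, threshold=1.5)
print(labels)  # → [0, 0, 1, 1]
```

Because each BGC is compared with a limited set of centroids instead of all other BGCs, the cost grows with the number of BGCs times the number of families rather than with the square of the number of BGCs. Placing a newly sequenced BGC into existing families, as in the "query mode" mentioned in the abstract, would correspond here to a nearest-centroid lookup against the stored `centroids`.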
7.
Nat Prod Rep ; 38(1): 264-278, 2021 01 01.
Article in English | MEDLINE | ID: mdl-32856641

ABSTRACT

Covering: 2010-2020. The digital revolution is driving significant changes in how people store, distribute, and use information. With the advent of new technologies around linked data, machine learning and large-scale network inference, the natural products research field is beginning to embrace real-time sharing and large-scale analysis of digitized experimental data. Databases play a key role in this, as they allow systematic annotation and storage of data for both basic and advanced applications. The quality of the content, structure, and accessibility of these databases all contribute to their usefulness for the scientific community in practice. This review covers the development of databases relevant for microbial natural product discovery during the past decade (2010-2020), including repositories of chemical structures/properties, metabolomics, and genomic data (biosynthetic gene clusters). It provides an overview of the most important databases and their functionalities, highlights some early meta-analyses using such databases, and discusses basic principles to enable widespread interoperability between databases. Furthermore, it points out conceptual and practical challenges in the curation and usage of natural products databases. Finally, the review closes with a discussion of key action points required for the field moving forward, not only for database developers but for any scientist active in the field.


Subject(s)
Biological Products; Databases, Factual; Microbiology; Anti-Bacterial Agents; Biosynthetic Pathways/genetics; Databases, Chemical; Databases, Pharmaceutical; Information Storage and Retrieval; Metabolomics; Multigene Family
8.
Nucleic Acids Res ; 49(D1): D639-D643, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33152079

ABSTRACT

Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.


Subject(s)
Data Mining; Databases as Topic; Enzymes/classification; Biosynthetic Pathways/genetics; Multigene Family; Search Engine
9.
Nucleic Acids Res ; 49(D1): D490-D497, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33010170

ABSTRACT

Computational analysis of biosynthetic gene clusters (BGCs) has revolutionized natural product discovery by enabling the rapid investigation of secondary metabolic potential within microbial genome sequences. Grouping homologous BGCs into Gene Cluster Families (GCFs) facilitates mapping their architectural and taxonomic diversity and provides insights into the novelty of putative BGCs, through dereplication with BGCs of known function. While multiple databases exist for exploring BGCs from publicly available data, no public resources exist that focus on GCF relationships. Here, we present BiG-FAM, a database of 29,955 GCFs capturing the global diversity of 1,225,071 BGCs predicted from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs). The database offers rich functionalities, such as multi-criterion GCF searches, direct links to BGC databases such as antiSMASH-DB, and rapid GCF annotation of user-supplied BGCs from antiSMASH results. BiG-FAM can be accessed online at https://bigfam.bioinformatics.nl.


Subject(s)
Biosynthetic Pathways/genetics; Databases, Genetic; Multigene Family; Clostridium/genetics; Search Engine; Streptomyces/genetics
10.
Nat Chem Biol ; 16(1): 60-68, 2020 01.
Article in English | MEDLINE | ID: mdl-31768033

ABSTRACT

Genome mining has become a key technology to exploit natural product diversity. Although initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. In the present study, a streamlined computational workflow is provided, consisting of two new software tools: the 'biosynthetic gene similarity clustering and prospecting engine' (BiG-SCAPE), which facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families; and the 'core analysis of syntenic orthologues to prioritize natural product gene clusters' (CORASON), which elucidates phylogenetic relationships within and across these families. BiG-SCAPE is validated by correlating its output to metabolomic data across 363 actinobacterial strains and the discovery potential of CORASON is demonstrated by comprehensively mapping biosynthetic diversity across a range of detoxin/rimosamide-related gene cluster families, culminating in the characterization of seven detoxin analogues.


Subject(s)
Actinobacteria/genetics; Biosynthetic Pathways/genetics; Computational Biology/methods; Genome, Bacterial; Algorithms; Biological Products; Cluster Analysis; Data Mining/methods; Genomics; Metabolomics; Microbiota; Multigene Family; Phylogeny; Reproducibility of Results; Software
11.
Nucleic Acids Res ; 48(D1): D454-D458, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31612915

ABSTRACT

Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.


Subject(s)
Databases, Genetic; Genome, Bacterial; Genomics/methods; Multigene Family; Software; Biosynthetic Pathways/genetics; Molecular Sequence Annotation
12.
Methods Mol Biol ; 1795: 173-188, 2018.
Article in English | MEDLINE | ID: mdl-29846928

ABSTRACT

Plants produce a vast diversity of specialized metabolites, which play important roles in the interactions with their microbiome, as well as with animals and other plants. Many such molecules have valuable biological activities that render them (potentially) useful as medicines, flavors and fragrances, nutritional ingredients, or cosmetics. Recently, plant scientists have discovered that the genes for many biosynthetic pathways for the production of such specialized metabolites are physically clustered on the chromosome within biosynthetic gene clusters (BGCs). The Plant Secondary Metabolite Analysis Shell (plantiSMASH) allows for the automated identification of such plant BGCs, facilitates comparison of BGCs across genomes, and helps users to predict the functional interactions of pairs of genes within and between BGCs based on coexpression analysis. In this chapter, we provide a detailed protocol on how to install and run plantiSMASH, and how to interpret its results to draw biological conclusions that are supported by the data.


Subject(s)
Biosynthetic Pathways/genetics; Computational Biology; Genome, Plant; Genomics; Multigene Family; Plants/genetics; Plants/metabolism; Computational Biology/methods; Genomics/methods; Secondary Metabolism; Software; Web Browser
13.
Proc Natl Acad Sci U S A ; 114(29): E6005-E6014, 2017 07 18.
Article in English | MEDLINE | ID: mdl-28673978

ABSTRACT

Sesterterpenoids are a rare terpene class harboring untapped chemodiversity and bioactivities. Their structural diversity originates primarily from the scaffold-generating sesterterpene synthases (STSs). In fungi, all six known STSs are bifunctional, containing C-terminal trans-prenyltransferase (PT) and N-terminal terpene synthase (TPS) domains. In plants, two colocalized PT and TPS gene pairs from Arabidopsis thaliana were recently reported to synthesize sesterterpenes. However, the landscape of PT and TPS genes in plant genomes is unclear. Here, using a customized algorithm for systematically searching plant genomes, we reveal a suite of physically colocalized pairs of PT and TPS genes for the biosynthesis of a large sesterterpene repertoire in the wider Brassicaceae. Transient expression of seven TPSs from A. thaliana, Capsella rubella, and Brassica oleracea in Nicotiana benthamiana yielded fungal-type sesterterpenes with tri-, tetra-, and pentacyclic scaffolds, and notably (-)-ent-quiannulatene, an enantiomer of the fungal metabolite (+)-quiannulatene. Protein and structural modeling analysis identified an amino acid site implicated in structural diversification. Mutation of this site in one STS (AtTPS19) resulted in premature termination of carbocation intermediates and accumulation of bi-, tri-, and tetracyclic sesterterpenes, revealing the cyclization path for the pentacyclic sesterterpene (-)-retigeranin B. These structural and mechanistic insights, together with phylogenetic analysis, suggest convergent evolution of plant and fungal STSs, and also indicate that the colocalized PT-TPS gene pairs in the Brassicaceae may have originated from a common ancestral gene pair present before speciation. Our findings further provide opportunities for rapid discovery and production of sesterterpenes through metabolic and protein engineering.


Subject(s)
Brassicaceae/genetics; Brassicaceae/metabolism; Genome, Plant; Plant Proteins/genetics; Sesterterpenes/biosynthesis; Algorithms; Alkyl and Aryl Transferases/genetics; Alkyl and Aryl Transferases/metabolism; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Dimethylallyltranstransferase/genetics; Dimethylallyltranstransferase/metabolism; Evolution, Molecular; Mutation; Phylogeny; Plant Proteins/metabolism; Plants, Genetically Modified; Sesterterpenes/genetics; Nicotiana/genetics; Nicotiana/metabolism
14.
Nucleic Acids Res ; 45(W1): W36-W41, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28460038

ABSTRACT

Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production of such compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features, including prediction of gene cluster boundaries using the ClusterFinder method or the newly integrated CASSIS algorithm, improved substrate specificity prediction for non-ribosomal peptide synthetase adenylation domains based on the new SANDPUMA algorithm, improved predictions for terpene and ribosomally synthesized and post-translationally modified peptides cluster products, reporting of sequence similarity to proteins encoded in experimentally characterized gene clusters on a per-protein basis and a domain-level alignment tool for comparative analysis of trans-AT polyketide synthase assembly line architectures. Additionally, several usability features have been updated and improved. Together, these improvements make antiSMASH up-to-date with the latest developments in natural product research and will further facilitate computational genome mining for the discovery of novel bioactive molecules.


Subject(s)
Secondary Metabolism/genetics; Software; Algorithms; Anti-Bacterial Agents/biosynthesis; Biological Products/metabolism; Biosynthetic Pathways/genetics; Codon; Genes; Internet; Peptide Synthases/metabolism; Peptides/chemistry; Peptides/metabolism; Polyketide Synthases/chemistry; Protein Domains; Protein Processing, Post-Translational; Terpenes/chemistry
15.
Nucleic Acids Res ; 45(W1): W55-W63, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28453650

ABSTRACT

Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied on 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results and source code are freely available from http://plantismash.secondarymetabolites.org.


Subject(s)
Genes, Plant; Genome, Plant; Software; Biosynthetic Pathways/genetics; Enzymes/genetics; Gene Expression Profiling; Genomics; Internet; Molecular Sequence Annotation; Plants/genetics; Plants/metabolism; Transcriptome